Abstract 

This paper addresses the issue of phase noise in OFDM systems. Phase noise (PHN) is a transceiver 
impairment resulting from the non-idealities of the local oscillator. We present a case for designing a 
turbo receiver for systems corrupted by phase noise by taking a closer look at the effects of the common 
phase error (CPE). Using an approximate probabilistic framework called variational inference (VI), we 
develop a soft-in soft-out (SISO) algorithm that generates posterior bit-level soft estimates while taking 
into account the effect of phase noise. The algorithm also provides an estimate of the phase noise 
sequence. Using this SISO algorithm, a turbo receiver is designed by passing soft information between 
the SISO detector and an outer forward error correcting (FEC) decoder that uses a soft decoding 
algorithm. It is shown that the turbo receiver achieves close to optimal performance. 

I. Introduction 

Increase in the demand for higher data rates has led to OFDM becoming the technology of 
choice for next generation wireless standards such as WiMAX and LTE. The deployment of 
the devices built around these standards brings to light the various implementation issues that 
become critical when designing an OFDM based system. PHN is an impairment that is quite 
different from the other impairments since it is a continuous time noise process that cannot be 
compensated for in the training stage. In this paper we consider the problem of data detection 
in an OFDM system that has been corrupted by PHN. 

PHN arises from imperfections in the local oscillator (LO) that result in a random drift of the 
LO phase from its reference, and it is commonly modelled as a Wiener process. In practice, however, 
a local oscillator is usually followed by a phase locked loop (PLL) that keeps the phase and 
frequency matched to a reference, and in such a scenario PHN is better modelled as a wide sense 
stationary (WSS) process with bounded variance [1]. In this paper we assume phase noise to be 
a first order auto-regressive (AR(1)) process as suggested in [2] for the IEEE 802.11g standard. 
The effect of PHN has been studied extensively [3]-[6]. It can be split into two parts: 
the rotation of all the sub-carriers by a common angle, called the common phase error (CPE), 
and the leakage from neighboring sub-carriers, which results in inter-carrier interference (ICI). 
CPE is the average of the PHN sequence spanning an OFDM symbol. 

PHN mitigation for single carrier systems and multi carrier systems such as OFDM has been 
studied extensively [5], [7]-[11]. While for single carrier systems it is possible to construct the 
factor graph and adopt a message passing algorithm [11], such an approach becomes prohibitively 
complicated for OFDM systems. Most algorithms suggested for PHN estimation in OFDM 
systems involve a decision directed process where the data symbols are estimated ignoring 
PHN and these decisions are fed back to estimate the PHN. This process is iterated either for 
a fixed number of times or until convergence. These are hard decision algorithms and do not 
provide any soft information on the decisions made. Another class of techniques that has been 
used to jointly estimate PHN and data symbols is the Markov chain Monte Carlo methods 
[8]. The major concern with decision directed algorithms is the assumption that the initial crude 
estimation of data symbols ignoring PHN results in a majority of the symbols being detected 
correctly, while a good fraction of the remaining symbols get corrected over the subsequent 
iterations. As we will see in a subsequent section, this need not be true and can lead 
to a premature error floor. This motivates the need for a turbo receiver that exchanges messages 
between the symbol detector and the outer FEC decoder. 

While [7] developed algorithms for PHN mitigation using VI that generated soft symbol 
estimates using a Gaussian approximation, we design a soft detection algorithm that computes 
soft estimates of the transmitted bits using the VI framework and a discrete distribution for 
the bits. The algorithm is capable of incorporating prior information on bits and hence fits in 
naturally with a soft decoding algorithm for the FEC code. The paper is organized as follows. 
Section II reviews modelling of phase noise. Section III sets up the received signal model and 
discusses the consequences of phase noise on the received signal. In Section IV the VI framework 
for this problem is presented along with the bit level detection algorithm. Section V presents 
the simulation results. 

II. Phase Noise Characteristics 

Since almost all communication systems employ an LO followed by a PLL, we assume PHN 
to be a WSS process. In particular, we model it as an AR(1) process. Such a model has been 
adopted by IEEE standards such as 802.11g, where phase noise is modelled as the output of a 
single pole Butterworth filter driven by a zero mean white noise process. Such a process has an 
autocorrelation function of the form [7] 

$$R_\theta(k) = \sigma_\theta^2 e^{-2\pi\Omega_o T_s |k|}. \tag{1}$$

Here, $1/T_s$ is the sampling rate, $\Omega_o$ is the one sided 3-dB bandwidth of the oscillator, and $\sigma_\theta$ is 
the RMS value of the PHN process in radians, a typical value being around 3°. In general, the 
3-dB bandwidth of an oscillator tends to be in the range of hundreds of kHz. One can write the 
covariance matrix of a length-N sequence of PHN as 

$$\Phi = \sigma_\theta^2 \begin{bmatrix} 1 & \rho & \cdots & \rho^{N-1} \\ \rho & 1 & \cdots & \rho^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{N-1} & \rho^{N-2} & \cdots & 1 \end{bmatrix} \tag{2}$$

where $\rho = e^{-2\pi\Omega_o T_s}$. Usually, the cut off frequency of the Butterworth filter ($\Omega_o$) is set to the 
3-dB bandwidth of the oscillator. 
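
As an illustration of the AR(1) model, the following Python sketch builds a covariance matrix with entries $\sigma_\theta^2 \rho^{|i-j|}$ and draws one stationary AR(1) path with that covariance. It is only a sketch under stated assumptions: the function names are hypothetical, the expression used for `rho` (an exponent of $-2\pi\Omega_o T_s$) is an assumption about the model, and the numerical values (100 kHz bandwidth, 20 MHz sampling rate, 3° RMS) are the ones quoted in the simulation section.

```python
import math
import random


def phn_covariance(n, sigma, rho):
    """Covariance of a length-n AR(1) sequence: sigma^2 * rho^|i-j|."""
    return [[sigma**2 * rho**abs(i - j) for j in range(n)] for i in range(n)]


def sample_phn(n, sigma, rho, rng):
    """Draw one stationary AR(1) path whose marginal standard deviation is sigma."""
    theta = [rng.gauss(0.0, sigma)]
    # Innovation std chosen so that the variance stays at sigma^2 for every sample.
    innov_std = sigma * math.sqrt(1.0 - rho**2)
    for _ in range(n - 1):
        theta.append(rho * theta[-1] + rng.gauss(0.0, innov_std))
    return theta


# Assumed illustrative values: 100 kHz bandwidth, 20 MHz sampling rate, 3 degrees RMS.
rho = math.exp(-2.0 * math.pi * 100e3 / 20e6)  # assumed form of the correlation coefficient
sigma = math.radians(3.0)
phi = phn_covariance(4, sigma, rho)
path = sample_phn(64, sigma, rho, random.Random(0))
```

The recursion keeps the marginal variance constant, so the generated sequence is WSS from the first sample onwards.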



A. Sample mean statistics 

Since the CPE plays a critical role in the detection of symbols affected by PHN, we take 
a closer look at the sample mean of a length-N PHN sequence. If we let $\bar{\theta}$ denote the 
sample mean of a length-N sequence of the phase noise process, we have 

$$\bar{\theta} = \frac{1}{N} \sum_{k=1}^{N} \theta_k. \tag{3}$$

Since the statistics of the phase noise process are assumed to be known, it can be shown that $\bar{\theta}$ 
is a zero mean Gaussian random variable with variance $\mathbf{1}^T \Phi \mathbf{1} / N^2$ [12]. Note that the variance 
of the sample mean is a function of N and goes to zero as $N \to \infty$. 
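
The variance of the sample mean can be evaluated numerically. The Python sketch below computes $\mathbf{1}^T \Phi \mathbf{1} / N^2$ for a covariance of the form $\sigma_\theta^2 \rho^{|i-j|}$, once by brute force and once by summing over lags, and then evaluates a one-sided Gaussian tail probability. The function names are hypothetical, the form of `rho` is an assumption, and the parameter values are taken from the simulation section; the result depends on these choices.

```python
import math


def sample_mean_variance(n, sigma, rho):
    """1^T Phi 1 / n^2 for Phi[i][j] = sigma^2 * rho^|i-j| (brute force)."""
    total = sum(rho**abs(i - j) for i in range(n) for j in range(n))
    return sigma**2 * total / n**2


def sample_mean_variance_closed(n, sigma, rho):
    """Same quantity by grouping equal lags: n + 2 * sum_{k=1}^{n-1} (n - k) * rho^k."""
    total = n + 2.0 * sum((n - k) * rho**k for k in range(1, n))
    return sigma**2 * total / n**2


def gaussian_tail(x, std):
    """P(X > x) for a zero mean Gaussian X with standard deviation std."""
    return 0.5 * math.erfc(x / (std * math.sqrt(2.0)))


# Assumed illustrative values: 64 samples, 3 degrees RMS, 100 kHz bandwidth, 20 MHz sampling.
sigma = math.radians(3.0)
rho = math.exp(-2.0 * math.pi * 100e3 / 20e6)  # assumed form of the correlation coefficient
var_64 = sample_mean_variance(64, sigma, rho)
p_exceed = gaussian_tail(math.radians(9.0), math.sqrt(var_64))
```

Because neighbouring PHN samples are strongly correlated, the variance of the mean decays much more slowly than $\sigma_\theta^2/N$, which is why short OFDM symbols are the critical case.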

III. Received Signal Model 

A. Gray mapping: bits to M-QAM symbols (from [13]) 

We assume M to be a perfect square. Let L denote $\log_2 M$ and let $B \in \{-1, 1\}^{N \times L}$ represent 
a matrix of N×L bits that need to be mapped to N M-QAM symbols. Denote the columns of B 
as $b_{r1}, b_{r2}, \ldots, b_{rL/2}, b_{i1}, b_{i2}, \ldots, b_{iL/2}$. If the resulting vector of M-QAM symbols is represented 
as d, then, from [13], we have the following relation: 

$$d = \sum_{l=1}^{L/2} 2^{l-1} \bigodot_{p=l}^{L/2} b_p + j \sum_{l=L/2+1}^{L} 2^{l-L/2-1} \bigodot_{p=l}^{L} b_p \tag{4}$$

Here, $\bigodot$ represents the element wise product of vectors and $b_p$ denotes the $p$-th column of B. Let the function $f$ denote the mapping 
from bits to symbols. Thus, $d = f(B)$. Further, denote the matrix $[b_{r1}\ b_{r2}\ \cdots\ b_{rL/2}]$ as $B_r$ and 
the matrix $[b_{i1}\ b_{i2}\ \cdots\ b_{iL/2}]$ as $B_i$, and define the functions $f_r$ and $f_i$ as: 

$$f_r(B_r) = \Re(f(B)) \tag{5}$$
$$f_i(B_i) = j\,\Im(f(B)) \tag{6}$$

Hence, $d = f(B) = f_r(B_r) + f_i(B_i)$. 
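
To make the product form of the mapping concrete, the Python sketch below maps L/2 antipodal bits to one amplitude using nested products, and combines two such amplitudes into a QAM symbol. It is an illustrative sketch only: the function names are hypothetical, and it assumes each real dimension is built as $\sum_{l} 2^{l-1} \prod_{p \ge l} b_p$ over its own L/2 bits.

```python
def pam_level(bits):
    """Map L/2 bits in {-1, +1} to one amplitude via sum_l 2^(l-1) * prod_{p>=l} b_p."""
    half = len(bits)
    level = 0
    for l in range(1, half + 1):
        prod = 1
        for p in range(l, half + 1):
            prod *= bits[p - 1]
        level += 2 ** (l - 1) * prod
    return level


def qam_symbol(bits_r, bits_i):
    """Combine the in-phase and quadrature amplitudes into one complex symbol."""
    return complex(pam_level(bits_r), pam_level(bits_i))


# 64-QAM uses three bits per dimension.
symbol = qam_symbol((1, 1, 1), (1, 1, -1))
```

With three bits per dimension the amplitudes are the eight odd integers from -7 to 7, and neighbouring amplitudes differ in exactly one bit, which is the Gray property.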

B. The received signal 

In this work, we consider the detection of an OFDM symbol transmitted over a block fading 
frequency selective channel, where the channel stays constant over the duration of one OFDM 
symbol. We also assume that perfect frame synchronization, including carrier frequency recovery, 
has been established in the training stage. We further assume that current channel conditions 
have been estimated during the training phase and that channel state information is available on 
the receiver side. Algorithms that can estimate the channel in the presence of PHN and carrier 
frequency offset have been presented in [8], [12]. In the data detection stage we assume that the 
received symbol vector has been affected by PHN in addition to the channel and the additive 
noise. The received signal for such a scenario in the discrete domain, after appropriate sampling 
and removal of the cyclic prefix, is given by 

$$r = PF^H H d + n = PF^H H f(B) + n. \tag{7}$$

Here, F is an N×N DFT matrix with the (l, m)th entry given by $F_{l,m} = (1/\sqrt{N})e^{-j2\pi(l-1)(m-1)/N}$, 
P is the diagonal matrix given by $\mathrm{diag}(e^{j\theta}) \approx \mathrm{diag}(\mathbf{1} + j\theta)$, where $\theta$ is the PHN sequence, 
$H = \mathrm{diag}(h)$ is the channel matrix in the frequency domain, and the N×L binary matrix B 
contains the transmitted bit sequence (d is the corresponding symbol sequence). n is complex 
white Gaussian noise with variance $\sigma^2$ per dimension. 

Detection schemes that ignore PHN involve computing the DFT of the received signal and then 
adopting MMSE or zero forcing detection on individual sub-carriers. The DFT of the received 
signal in the presence of phase noise results in a vector $R = [R_0\ R_1\ \ldots\ R_{N-1}]^T$ which can be 
written as [7] 

$$R_k = c_0 d_k h_k + \sum_{l=0,\, l \neq k}^{N-1} d_l h_l c_{(l-k) \bmod N} + \nu_k. \tag{8}$$

Here, the vector $c = [c_0\ c_1\ \ldots\ c_{N-1}]^T$ is given by $(1/\sqrt{N})Fp$ (where $p = e^{j\theta}$), i.e. the frequency 
domain representation of the PHN sequence. It can be shown that $\nu$ is an uncorrelated white 
noise process with $\nu_k \sim \mathcal{CN}(0, 2\sigma^2)$. Eq. (8) clearly illustrates how PHN affects the received 
signal. Note that $c_0$ is the CPE, i.e. $1 + \frac{j}{N}\sum_k \theta_k$ (under the small angle assumption), and its effect 
is to rotate every received symbol by the average phase angle $\bar{\theta}$. The next subsection further 
elaborates the consequences. 
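
The small angle statement about $c_0$ can be checked numerically. The following Python sketch computes the frequency domain representation of a short PHN sequence and compares its zeroth coefficient with $1 + j\bar{\theta}$. The function name and the PHN values are made up for illustration.

```python
import cmath


def phn_spectrum(theta):
    """c_m = (1/N) * sum_n exp(j*theta_n) * exp(-2j*pi*n*m/N)."""
    n = len(theta)
    p = [cmath.exp(1j * t) for t in theta]
    return [
        sum(p[i] * cmath.exp(-2j * cmath.pi * i * m / n) for i in range(n)) / n
        for m in range(n)
    ]


theta = [0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.03, 0.05]  # radians, made-up values
c = phn_spectrum(theta)
theta_bar = sum(theta) / len(theta)
cpe_angle = cmath.phase(c[0])            # rotation applied to every sub-carrier
ici_power = sum(abs(cm) ** 2 for cm in c[1:])  # energy leaked into the other coefficients
```

For small angles `cpe_angle` is close to `theta_bar`, while `ici_power` collects the part of the PHN energy that appears as ICI.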

C. Effect of phase noise on the received signal and its consequences 

Decision directed feedback mechanisms that have been suggested to counter the ICI resulting 
from PHN involve an initial step where the symbols are detected in a crude manner, ignoring 
the effects of PHN. The underlying assumption is that a majority of symbols detected in this 
way are correct and hence a back substitution of these symbols into the received signal structure 
must aid the detection process. This need not be true for certain realizations of the phase noise, 
specifically, scenarios where the CPE is high. This is because the received symbol gets rotated 
by an angle equal to the CPE and if this angle is greater than the average angular separation 
between the adjacent symbols of the constellation, the symbol is likely to be detected in error. As 
an example, for the 64-QAM constellation a rotation of 9° causes a symbol error with probability 
0.43 in the absence of noise. When the oscillator linewidth is set to 100 kHz and the RMS value 
of phase noise is set to 3°, the probability of encountering a length 64 (assume 64 sub-carriers 
in one OFDM symbol) phase noise sequence whose mean is greater than 9° is 4 x 10^^. This 
leads to a premature error floor. 
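
The 64-QAM rotation example can be reproduced by counting constellation points. The Python sketch below rotates every point of a 64-QAM constellation by a given angle, applies a minimum distance decision per axis, and returns the fraction of points decided in error. It assumes equiprobable symbols, no additive noise and no channel; the names are hypothetical.

```python
import cmath
import math

LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]


def slice_axis(x):
    """Nearest 8-PAM level (minimum distance decision on one axis)."""
    return min(LEVELS, key=lambda a: abs(x - a))


def rotation_error_rate(angle_deg):
    """Fraction of 64-QAM points detected in error after a pure rotation."""
    rot = cmath.exp(1j * math.radians(angle_deg))
    errors = 0
    for a in LEVELS:
        for b in LEVELS:
            z = complex(a, b) * rot
            if (slice_axis(z.real), slice_axis(z.imag)) != (a, b):
                errors += 1
    return errors / 64


rate = rotation_error_rate(9.0)
```

Under these assumptions a 9° rotation puts 28 of the 64 points in error (a rate of about 0.44), in line with the probability quoted above; all of the affected points lie on the outer ring of the constellation.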

One can counter this by either embedding pilots in the OFDM symbol and using them to 
estimate and compensate for CPE [5] or by designing a blind turbo receiver. While embedding 
pilots is a common practice to aid channel estimation, this comes at the cost of spectral efficiency. 
In scenarios where embedding pilots in every OFDM symbol is not an option one needs to 
consider designing a turbo receiver. Since the variance of the CPE decreases with increasing 
length of the PHN sequence, there exists a compelling case to design a turbo receiver for short 
length OFDM symbols. Having made the case for joint detection-decoding, we design a bit level 
detection algorithm that generates soft bit estimates and use it to set up a turbo receiver. The 
next subsection explains the basic principles involved. 

D. Actual and postulated posterior distributions 

From [7], we note that by approximating $e^{j\theta}$ by $\mathbf{1} + j\theta$ under the small angle assumption, one 
can write the conditional distribution of r given B and $\theta$ to be: 

$$p(r|B, \theta) = \mathcal{CN}\left(\mathrm{diag}(F^H H f(B))(\mathbf{1} + j\theta),\ 2\sigma^2 I\right) \tag{9}$$

Further, in [7] it was shown that the optimal detector (i.e. ML estimate of d) for such a 
signal model, given the prior distribution of the phase noise sequence, has an exponential 
complexity in N. Given that the distribution $p(r|B)$ and consequently the posterior distribution 
$p(B|r)$ do not lend themselves to efficient optimal detection, we look at an approximation to 
$p(B|r)$ such that computing the optimal estimate corresponding to the approximated distribution 
is straightforward. To this end, we note that $p(B|r)$ is computed by marginalizing $p(B, \theta|r)$ with 
respect to $\theta$, i.e. 

$$p(B|r) = \int p(B, \theta|r)\, d\theta \tag{10}$$

The variational inference approach approximates $p(B, \theta|r)$ with a function $Q(B, \theta)$ of the form 
$Q_B(B)Q_\theta(\theta)$ such that 

$$Q(B, \theta) \approx p(B, \theta|r) \tag{11}$$

This kind of approximation is equivalent to assuming that B and $\theta$ are independent conditioned 
on r, and this in turn implies that the maximizer of the distribution $Q_B(B)$ is the optimal estimate 
of B. We further assume that the distribution $Q_B(B)$ can be factorized into $\prod_{n=1}^{N}\prod_{l=1}^{L} Q_b(b_{nl})$. 
Such a factorization for the postulated posterior distribution of the bits, where the distribution is 
assumed to be independent over n and l, is commonly known as the mean field approximation. 
In this paper, we assume $Q_b(b_{nl})$ to be a Bernoulli distribution with parameter $\lambda_{nl}$. We postulate 
the posterior conditional distribution of phase noise to be a multi-variate Gaussian distribution 
with mean $m_\theta$ and covariance $S_\theta$. Thus, we have 



$$Q(B, \theta) = Q_B(B)\,Q_\theta(\theta) = \left[\prod_{n=1}^{N}\prod_{l=1}^{L} \lambda_{nl}^{\frac{1+b_{nl}}{2}} (1-\lambda_{nl})^{\frac{1-b_{nl}}{2}}\right] \mathcal{N}(m_\theta, S_\theta) \tag{12}$$

$$= \left[\prod_{n=1}^{N}\prod_{l=1}^{L} \left(\frac{1+\bar{b}_{nl}}{2}\right)^{\frac{1+b_{nl}}{2}} \left(\frac{1-\bar{b}_{nl}}{2}\right)^{\frac{1-b_{nl}}{2}}\right] \mathcal{N}(m_\theta, S_\theta) \tag{13}$$

where $\bar{b}_{nl} = 2\lambda_{nl} - 1$ is the mean of the postulated posterior distribution $Q_b(b_{nl})$. Having fixed 
the structure of the postulated posterior distribution, it remains now to compute the parameters of 
this distribution such that it closely approximates the actual posterior distribution. The variational 
inference approach introduces the concept of variational free energy as a measure of similarity 
between two distributions. The variational free energy between the two distributions of interest 
here is given by 

$$F(Q, p) = \int Q(B, \theta) \log \frac{Q(B, \theta)}{p(B, \theta, r)}\, dB\, d\theta. \tag{14}$$

We use the distribution $p(B, \theta, r)$ instead of $p(B, \theta|r)$ as they are related simply through a 
constant of proportionality. It is to be noted that the free energy expression is exactly equal to the 
Kullback-Leibler divergence between $Q(B, \theta)$ and $p(B, \theta|r)$ to within an additive constant. The 
variational inference approach involves the minimization of the free energy over the parameters 
of $Q(B, \theta)$ so that the resulting distribution closely approximates the actual posterior distribution. 
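
The relation between the free energy and the Kullback-Leibler divergence can be verified on a toy problem. The Python sketch below uses a single binary unknown with a made-up joint probability table, not the signal model of this paper, and checks that the free energy differs from the divergence to the true posterior by the constant $-\log p(r)$. All names and numbers are illustrative assumptions.

```python
import math


def free_energy(q, joint):
    """sum_x Q(x) * log(Q(x) / p(x, r)) for a fixed observation r."""
    return sum(qx * math.log(qx / jx) for qx, jx in zip(q, joint) if qx > 0)


def kl_divergence(q, post):
    """sum_x Q(x) * log(Q(x) / p(x | r))."""
    return sum(qx * math.log(qx / px) for qx, px in zip(q, post) if qx > 0)


joint = [0.03, 0.12]                  # p(x, r) for x in {0, 1}; made-up numbers
evidence = sum(joint)                 # p(r)
posterior = [j / evidence for j in joint]
q = [0.4, 0.6]                        # a trial distribution Q(x)
gap = free_energy(q, joint) - kl_divergence(q, posterior)  # equals -log p(r)
```

Since the gap does not depend on Q, minimizing the free energy over Q is the same as minimizing the divergence to the posterior, and the minimum is reached when Q equals the posterior.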

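The actual posterior distribution can be computed by conditioning over the unknowns as 
follows: 

$$p(B, \theta, r) = p(r|B, \theta)\,p(B)\,p(\theta) \tag{15}$$

Using this factorization and assuming independent Bernoulli distributed priors on bits, with the $l$-th bit of the 
$n$-th symbol having mean $\mu_{nl}$ (set $\mu_{nl}$ to 0 if no prior information is available), and assuming 
the prior distribution of PHN to be a multi-variate Gaussian distribution with mean $\mu_\theta$ and 
covariance matrix $\Phi_\theta$ (if no prior information is available, set the mean to 0 and the covariance to $\Phi$ 
as given in (2)), we can write the posterior distribution as 

$$p(B, \theta, r) = \mathcal{CN}\left(\mathrm{diag}(F^H H f(B))(\mathbf{1} + j\theta),\ 2\sigma^2 I\right) \left[\prod_{n=1}^{N}\prod_{l=1}^{L} \left(\frac{1+\mu_{nl}}{2}\right)^{\frac{1+b_{nl}}{2}} \left(\frac{1-\mu_{nl}}{2}\right)^{\frac{1-b_{nl}}{2}}\right] \mathcal{N}(\mu_\theta, \Phi_\theta) \tag{16}$$

The next section discusses the computation and minimization of the free energy expression. 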



The next section discusses the computation and minimization of the free energy expression. 

IV. The bit-level Variational Inference Algorithm 
A. Free Energy Evaluation 



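The free energy expression given in (14) can be written as the summation of five terms as 
shown in (17). 

$$\begin{aligned} F(Q, p) = &- \int Q(B) \log p(B)\, dB - \int Q(\theta) \log p(\theta)\, d\theta + \int Q(B) \log Q(B)\, dB \\ &+ \int Q(\theta) \log Q(\theta)\, d\theta - \int_{B,\theta} Q(B)Q(\theta) \log p(r|B, \theta)\, dB\, d\theta \end{aligned} \tag{17}$$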



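The exact computation of the free energy expression is given in Appendix A and the final 
expression is presented here. Note that the free energy expression is parameterized by the mean 
and the covariance matrix of the PHN sequence and the mean value of the posterior estimate 
of the bits. Note that when treating the unknown B as a matrix of Bernoulli random variables, 
we denote by $\bar{B}$ the matrix of the means ($\bar{b}_{nl}$) of the corresponding random variables in B. We 
define $\bar{B}_r$, $\bar{B}_i$, $\bar{b}_{rk}$ and $\bar{b}_{ik}$ in a similar fashion. 

$$\begin{aligned}
F(m_\theta, S_\theta, \bar{B}) = &- \sum_{n=1}^{N}\sum_{l=1}^{L}\left[\left(\frac{1+\bar{b}_{nl}}{2}\right)\log\left(\frac{1+\mu_{nl}}{2}\right) + \left(\frac{1-\bar{b}_{nl}}{2}\right)\log\left(\frac{1-\mu_{nl}}{2}\right)\right] + \frac{1}{2}\mathrm{tr}(\Phi_\theta^{-1}S_\theta) + \frac{1}{2}m_\theta^T\Phi_\theta^{-1}m_\theta \\
&- \mu_\theta^T\Phi_\theta^{-1}m_\theta + \sum_{n=1}^{N}\sum_{l=1}^{L}\left[\left(\frac{1+\bar{b}_{nl}}{2}\right)\log\left(\frac{1+\bar{b}_{nl}}{2}\right) + \left(\frac{1-\bar{b}_{nl}}{2}\right)\log\left(\frac{1-\bar{b}_{nl}}{2}\right)\right] - \frac{1}{2}\log|S_\theta| \\
&+ \frac{1}{2\sigma^2}\Big[ -r^H Z f(\bar{B}) - f(\bar{B})^H Z^H r + f_r(\bar{B}_r)^H M_0 f_i(\bar{B}_i) + f_i(\bar{B}_i)^H M_0 f_r(\bar{B}_r) \\
&\qquad + f_r(\bar{B}_r)^H M_1 f_r(\bar{B}_r) + f_i(\bar{B}_i)^H M_1 f_i(\bar{B}_i) + \mathbf{1}^T M_2 \nu_r + \mathbf{1}^T M_2 \nu_i \Big]
\end{aligned} \tag{18}$$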

B. Free Energy Minimization 

Closed form expressions for the parameters that jointly minimize the free energy 
expression cannot be computed. Hence, we adopt a coordinate-descent approach to the 
minimization, wherein one parameter is updated while keeping the others constant. Such an approach 
will converge to a local minimum. The update to each parameter is obtained by computing the 
gradient of the free energy expression given in (18) w.r.t. the parameter and setting it to zero. 
This leads to three update equations, one each for the mean and covariance matrix of the phase 
noise and one for updating the mean corresponding to the bits. The update equations are: 
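
The coordinate-descent idea can be illustrated on a toy objective. The Python sketch below minimizes a simple two-variable quadratic, not the free energy in (18), by repeatedly setting the partial derivative with respect to one variable to zero while holding the other fixed. The function and the objective are hypothetical and chosen only because each update has a closed form.

```python
def coordinate_descent(x, y, sweeps):
    """Minimize g(x, y) = x^2 + y^2 + x*y - 3*x one coordinate at a time."""
    for _ in range(sweeps):
        x = (3.0 - y) / 2.0  # solves dg/dx = 2x + y - 3 = 0 with y held fixed
        y = -x / 2.0         # solves dg/dy = 2y + x = 0 with x held fixed
    return x, y


x_opt, y_opt = coordinate_descent(0.0, 0.0, 50)
```

This toy objective is convex, so the iteration reaches the unique minimizer (2, -1). For a non-convex objective such as the free energy, the same procedure is only guaranteed to reach a local minimum, which depends on the initialization.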



mo 



-Im(r^X„ 



'■brk 



diag(afc)K{Z^r} - diag(5fc)Mf 1 + diag(afc) (jS{Mo}/,(B,) - 5R{M^}/,(B, 



diag(/3,.)S{Z^r} - diag(nfe)Mf 1 + diag(/3fe) (3{Mo}/,(B,) + j3?{M^}/,;(B,) 



(19) 



(20) 



(21) 



(22) 



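In the above equations, $t_{\bar{b}_{rk}}$ and $t_{\bar{b}_{ik}}$ represent $\tanh^{-1}(\bar{b}_{rk})$ and $\tanh^{-1}(\bar{b}_{ik})$ (computed element 
wise) respectively, while $t_{\mu_{rk}}$ and $t_{\mu_{ik}}$ represent $\tanh^{-1}(\mu_{rk})$ and $\tanh^{-1}(\mu_{ik})$ respectively, with 
$\mu_{rk}$ and $\mu_{ik}$ defined analogous to $\bar{b}_{rk}$ and $\bar{b}_{ik}$ but with prior means. Further, the derivatives 
$\partial f_r(\bar{B}_r)/\partial \bar{b}_{rk}$ and $\partial f_i(\bar{B}_i)/\partial \bar{b}_{ik}$ are denoted as $\mathrm{diag}(\alpha_k)$ and $\mathrm{diag}(\beta_k)$. The derivatives 
$\partial \nu_r/\partial \bar{b}_{rk}$ and $\partial \nu_i/\partial \bar{b}_{ik}$ are 
denoted as $\mathrm{diag}(\delta_k)$ and $\mathrm{diag}(\Omega_k)$. Detailed derivation of the update equations is given in 
Appendix B. 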



C. The Variational Inference Algorithm 

The pseudo code in Algorithm 1 gives the steps involved in the VI based soft bit 
detection algorithm. It was noted that because of the coordinate descent approach to free 
energy minimization, the algorithm converges to a local minimum and not necessarily the global minimum. 
This was particularly found to be an issue in scenarios of high CPE. To overcome this, we use 
the second term of (17) (denoted as F2 henceforth) as an indicator of convergence to the right 
PHN sequence. Whenever F2 is greater than a certain threshold, it indicates that the estimated 
PHN sequence does not correspond to the expected statistics and suggests that the algorithm has 
converged to a wrong sequence. In such scenarios, we ignore the output of the algorithm and 
compute the soft bit estimates ignoring PHN. 

Algorithm 1 Bit level Variational Inference Algorithm 

Initialize $\mu_{nl} \leftarrow 0$ or given priors; $\mu_\theta \leftarrow 0$; $\Phi_\theta \leftarrow \Phi$; $\bar{b}_{nl} \leftarrow 0$; $m_\theta \leftarrow 0$; $S_\theta \leftarrow 0$. 
Compute $t_{\mu_{rk}}$ and $t_{\mu_{ik}}$ for $k \in \{1, 2, \ldots, L/2\}$. 
for s = 1 : num_iter do 
    Compute $\nu_r$, $\nu_i$, $\alpha_k$, $\delta_k$, $X$. 
    Update $S_\theta$ using (19). 
    Update $m_\theta$ using (20). 
    for k = L/2 : 1 do 
        Compute $t_{\bar{b}_{rk}}$ and $t_{\bar{b}_{ik}}$ using (21) and (22). 
        for p = 1 : L/2 do 
            Update $\bar{B}$, $\alpha_p$, $\beta_p$, $\delta_p$, $\Omega_p$. 
        end for 
    end for 
end for 
Compute F2 in (17). 
if F2 > threshold then 
    Ignore PHN and detect symbols. 
else 
    Return $2(t_{\bar{b}_{rk}} - t_{\mu_{rk}})$ and $2(t_{\bar{b}_{ik}} - t_{\mu_{ik}})$ for $k \in \{1, 2, \ldots, L/2\}$. 
end if 
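
The $\tanh^{-1}$ quantities used by the detector are directly related to log-likelihood ratios. For a bit $b \in \{-1, +1\}$ with mean $\bar{b}$, the ratio $\log[P(b=+1)/P(b=-1)]$ equals $2\tanh^{-1}(\bar{b})$. The Python sketch below computes this value and an extrinsic value obtained by subtracting the prior contribution, as is usual in turbo processing. The function names and the clipping constant are assumptions made for illustration.

```python
import math

CLIP = 1.0 - 1e-12  # assumed guard so that atanh stays finite for means at +/-1


def llr_from_mean(mean):
    """log P(b=+1)/P(b=-1) for an antipodal bit with E[b] = mean."""
    clipped = max(-CLIP, min(CLIP, mean))
    return 2.0 * math.atanh(clipped)


def extrinsic_llr(posterior_mean, prior_mean):
    """Posterior LLR minus prior LLR, i.e. the new information from the detector."""
    return llr_from_mean(posterior_mean) - llr_from_mean(prior_mean)


# A posterior mean of 0.8 means P(b=+1) = 0.9; with a uniform prior the extrinsic LLR is log(9).
value = extrinsic_llr(0.8, 0.0)
```

Passing only the extrinsic part to the decoder avoids feeding the decoder's own prior information back to it on the next outer iteration.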

V. Simulation results 

To test the performance of the suggested algorithm we set up the following simulation. We 
simulated a link using 64-QAM constellation and OFDM with 64 sub-carriers over a frequency 
selective channel. The channel was assumed to be a Rayleigh multipath fading channel with 10 
taps and an exponential power delay profile. The sampling rate was set to 20 MHz, which in 
turn implies a sub-carrier spacing of 312.5 kHz. The oscillator bandwidth was set to 100 kHz 
and the standard deviation $\sigma_\theta$ was set to 3°. The MATLAB code presented in [2] was used to 
generate the PHN sequences. 

Fig. 1: BER plot of the turbo receiver (BER versus Eb/No in dB). Curves: VI Turbo iter 1, VI Turbo iter 2, 
VI Turbo iter 3, ignoring phase noise, no phase noise, one pass detection-decoding. 

For the turbo receiver setup, an outer LDPC code of rate 3/4 and length 2304 was used. On 
the transmit side, the message bits were encoded, interleaved and mapped to symbols from the 
64-QAM constellation. The length of the outer code was chosen so as to span 6 OFDM symbols. 
On the receiver side, after the reception of the 6 OFDM symbols, each was passed through the 
bit level detection algorithm and the extrinsic information so obtained was passed to the soft 
decoder after deinterleaving. To begin with, the detector was initialized to uniform priors; in 
subsequent iterations, the extrinsic information obtained from the decoder was used as priors. 
For this particular simulation, we used 3 outer iterations with 6 iterations for the decoder and 5 
iterations for the detector. Fig. 1 presents the results of this simulation and shows a significant 
improvement in performance with every subsequent iteration of the turbo receiver. It can also 
be seen that the turbo receiver clearly out-performs a one pass receiver where there is no loop 
back between the detector and decoder. The receiver performance is compared against scenarios 
where there is no PHN and where PHN is completely ignored. For both these cases, and for the 
one pass setup, the FEC decoder was run for 18 iterations. 
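
The numbers in this setup are mutually consistent, which the short Python sketch below checks by simple arithmetic. The variable names are hypothetical; the values are the ones stated in this section.

```python
import math

sampling_rate_hz = 20e6
n_subcarriers = 64
bits_per_symbol = int(math.log2(64))   # 64-QAM carries 6 bits per symbol
ofdm_symbols_per_codeword = 6
code_rate_num, code_rate_den = 3, 4

subcarrier_spacing_hz = sampling_rate_hz / n_subcarriers
coded_bits = ofdm_symbols_per_codeword * n_subcarriers * bits_per_symbol
info_bits = coded_bits * code_rate_num // code_rate_den

# Decoder iterations spent by the turbo receiver: outer iterations times decoder iterations.
turbo_decoder_iterations = 3 * 6
```

The coded block of 2304 bits fills exactly 6 OFDM symbols, and the 3 × 6 decoder iterations of the turbo receiver equal the 18 iterations given to the reference receivers.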

VI. Conclusion 

In this paper we highlighted the consequences of high CPE and its effect on decision directed 
algorithms. To overcome this effect, we developed a bit-level soft detection algorithm using the 
VI approach and used this algorithm to build a turbo receiver. Through simulations we have 
established the performance gains that can be achieved using the turbo receiver. 

Appendix A 
Evaluation of the free energy expression 



As shown in (17), the free energy expression can be written as the sum of five terms. Since 
we have adopted a discrete distribution for $Q_B(B)$, it is straightforward to compute the first 
and third terms. The second and fourth terms together constitute the Kullback-Leibler divergence 
between two multi-variate Gaussian distributions and can be computed easily. We discuss the 
computation of the fifth term here. Now, 

$$-2\sigma^2 (v) = \text{const.} + \int Q(B)Q(\theta)\,(r - PF^H H f(B))^H (r - PF^H H f(B))\, dB\, d\theta \tag{23}$$

Proceeding exactly as in [7], we can simplify (23) to 

$$-2\sigma^2 (v) = \text{const.} + \int Q(B)\left[f(B)^H Z^H Z f(B) - r^H Z f(B) - f(B)^H Z^H r + f(B)^H \Psi f(B)\right] dB \tag{24}$$

Here, we have defined $Z = \mathrm{diag}(\mathbf{1} + j m_\theta)F^H H$ and $\Psi = H^H F\,\mathrm{diag}(S_\theta)F^H H$. Using Lemmas 5 
and 6 from [13], we can simplify this to 



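$$\begin{aligned} -2\sigma^2 (v) = \text{const.} &- r^H Z f(\bar{B}) - f(\bar{B})^H Z^H r + \mathbf{1}^T M_2 (\nu_r + \nu_i) + f_r(\bar{B}_r)^H M_0 f_i(\bar{B}_i) \\ &+ f_i(\bar{B}_i)^H M_0 f_r(\bar{B}_r) + f_r(\bar{B}_r)^H M_1 f_r(\bar{B}_r) + f_i(\bar{B}_i)^H M_1 f_i(\bar{B}_i). \end{aligned} \tag{25}$$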



In the equation above, we have used the following definitions. 

$$M_0 = \Psi + Z^H Z$$
$$M_1 = \Psi + Z^H Z - \mathrm{diag}(\Psi + Z^H Z)$$
$$M_2 = \mathrm{diag}(\Psi + Z^H Z)$$

p=j L/2 
0<i<j<(i/2) p=i 1=1 

0<j<j<(i/2) p=i i=l 

Appendix B 
Computing the gradient 

A. Gradient w.r.t. $S_\theta$ 



Using ( |T8| ) and ( [14] ), we can write 

dF d{—i — ii + in + iv — v) 



Using results from [14], one can compute the terms above to be: 

d{iii) 



dii) 



dS 



-1 



0; 



d{ii) 



dS 



-1 



dS 



-1 



0: 



d{iv) 



To compute $\partial (v)/\partial S_\theta^{-1}$, note the following two results [7]: 



(a) 



(b) 



a(x^diag(Sg)y) _ d (tr(diag(x^)Sediag(y))) 

^ a(tr(diag(y)diag(x^)Sg)) 
dSg' 

= - (diag(y)diag(x^)) 
9(tr(Xdiag(Se))) 9(tr(diag(X)Se)) 



dS 



-1 



-S^(diag(X))S^ 
In the above equations, by $\mathrm{diag}(S_\theta)$ we mean the diagonal matrix formed using the diagonal 
entries of the matrix $S_\theta$. Using (33) and (34) we can compute $\partial (v)/\partial S_\theta^{-1}$ to be 



d{v) _-l 



- (X„X^)S^ - S^ diag(F^Hdiag(i/, + - l/(B)nH^F)S^ 



7H 



2\TT-ff"I7\C!r 



Using (32) and (35) it is straightforward to compute (19). 



(35) 



B. Gradient w.r.t. $m_\theta$ 



Using (17) and (18), we can write 

dF d{—i — ii + in + iv — v) 



d niQ dmg 
Using results from [14], one can compute the terms above to be 



d{t) _ ^ d{ii) 



-T ,-1 d{m) d{iv) 



4>e tJ-e - 4>e ^e] 



0; 



0; 



(36) 



(37) 



dme dme dme dvae 

To compute $\partial (v)/\partial m_\theta$, note the following two results. For any two given vectors x and y, using 
results from [14], we have: 

(a) 

d (x^(I + idiag(me))^(I + jdiag(me))y) 8 ((1 + jme)^diag(x^)diag(y)(l + ]vae)) 



dmf, 



- 2diag(x^)diag(y)me 



(38) 



(b) 



d (x^diag(Z^Z)y) d (tr(diag(x^)Z^Zdiag(y))) 



dme 



dme 

d{{\+ jmg)^diag(F^Hdiag(y)diag(x^)H^F)(l - jmg)) 

dme 



=2diag(F^Hdiag(y)diag(x^)H^F)m9 
Using (38) and (39), we can compute $\partial (v)/\partial m_\theta$ to be: 

1^ =^ X^X^ine + 3(r^X^)^ + diag(F^Hdiag(z/, + z/i- l/(B)n)me 



(39) 



(40) 



Using (37), (40) and the update equation for $S_\theta$ given in (19), we can compute the update 
equation for $m_\theta$ to be as given in (20). 

C. Gradient w.r.t. $\bar{b}_{rk}$ and $\bar{b}_{ik}$ 

dF 



To compute 



dhrk 

N L 
n=l 1=1 



consider the term (z) from (|17|): 



Differentiating (41) w.r.t to hni gives tanh ^{^ni)- Thus, if we denote as i^rk and 



(41) 



d(i) 



as t^jfc for k e {1,2,... f}, we have i^rk = tanh ^Mrfe) and t^i^ = tanh ^Mifc)- Similarly, 

djii) 

dhrk 



differentiating term (ii) (given in (42)) w.r.t to 6„; gives tanh ^(6„/). Denoting as t^,^^ and 



djii) 
dbik 



as tr., , we have tr , = tanh ^(b^^) and tr = tanh ^(hik). 



[It 



brk 
N L 

EE 

n=l 1=1 



log 



log 



(42) 



Clearly, the terms (Hi) and (iv) of 17 are independent of b^A; and hik- It remains to compute 
the derivative of the fifth term. To compute and we note the following : 



(a) From [13], 



dhrk 

(b) From [13], 



diag(Q;fc) where, (Xk 



k L/2 

l(A; = L/2)2^-\l + ^2'-i JJ b,^ 

P = l,Py^k 



(43) 



z=i 



ab 



k L/2 

diag(/3,) where, /3, = j. = L/2)2^-M + 2'-^ JJ b.^ 

1=1 p=Lpj^k 



(44) 



(c) For any given matrix M with complex entries, 

a/,(B,)^M/,(B,) 9/,(B,)^M/,(B,) 9/,(B, 



dhrk df^{Br) dhrk 

(d) For any given matrix M with complex entries, 

5/,(B,)^M/,(B,;) 9/,(B,)^M/,(B,) dMB,) 



3?{/,(B,,)^M}diag(a,fc) 



1 T 



dh 



ik 



dh,. 



ik 



53{/,(B,)^M}diag(/3,,) 



1 T 



(45) 



(46) 



(e) From [13], 



dUr 



dh 



rk 



diag(l(0 < A; < L/2).22^1+ ^ 



diag(5A 



(47) 



0<i<k<j<L/2 p=i 
(f) From [13], 



dhik 

(g) For any given matrix M, 

(h) For any given matrix M, 



diag 1(0 <fc< L/2). 22^=1+ J2 W^ip 



diag(riA 



(48) 



0<i<k<j<L/2 p=i 



dhrk 



M^ldiag((5fe) 



dh 



Idiag(nfc) 



(49) 



(50) 



ik 



In (47) and (48), $\mathbb{1}(\cdot)$ represents the indicator function, which is 1 if the argument is true and 0 
otherwise. Using all of the results from equations (43) to (50), we can compute the gradient to 
be: 



dF 



dh 



rk 



diag(a,) (3ft{Z^r} - ^{Ml}f^{A,[ 



+ jdiag(afc)^>{Mo}/,(B,) - diag((5fe)M[l 



(51) 



dF 



dh 



ik 



diag(/3,)(3{Z^r} + ^>{Mo}/,(B,; 



j.diag(/3,)3?{Mn/,(B.) - diag(f2,)Mfl 



(52) 